The Third Manifesto, StreamSQL, Weblogs, and Logfiles

Mark Leighton Fisher on 2007-07-13T17:00:03

[This needs more thought, but here is a start...]

The Third Manifesto, StreamSQL, Weblogs, and Logfiles – all of these address what I think is a deeper ur-problem, the problem of temporally-bounded data. Ofttimes you can just consider some data as true for that moment, and that is all you need (whether the oncoming stoplight is at "stop" or "go"). When you must deal with data over a period of time, then you must consider when that data is valid and when it is invalid.

There are many ways of categorizing your data – you can place it in one or more hierarchies (remember, There Is No One Ontology), you can tag it with one or more keywords (whether formally or informally defined), you can arrange the data by its date characteristic(s), etc. I think weblogs have become popular in part due to their format – the most recent items are at the start of the weblog. Sapients are geared to pay more attention to immediate needs ("where's some food?", "how can I get out of this freezing rain?"), so arranging items on a webpage by descending date/time feels natural. Many of you work in fields where knowledge is constantly increasing and changing – the latest developments from 1995 will be of less immediate interest than the latest developments of today.

"What is important" and "What is urgent" are the two questions you ask when setting priorities. "What is important" can be a question for the ages. "What is urgent", i.e. what must be handled now or (more than likely) forgotten forever, can be at least partially automated by using temporally-bounded data. You can also ask "What was important THEN", a key question in historical research ("History does not repeat itself, but it sure does rhyme.").

Whether you are tracking financial data, blood glucose of your patients (or yourself), or figuring out which candidate to vote for based on their historical record, handling the temporal validity boundaries (now there's a mouthful) of your data is vital.
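
To make "temporal validity boundaries" a little more concrete, here is a minimal Python sketch of one way to attach a validity interval to a piece of data and ask what was true at a given moment. Everything in it is an assumption for illustration only: the `TimedFact` class, the `facts_valid_at` helper, the half-open interval convention, and the sample glucose readings are all made up, and none of it is taken from The Third Manifesto or StreamSQL.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class TimedFact:
    """Hypothetical record: a value plus the interval in which it holds.

    The interval is half-open, [valid_from, valid_to), so one fact can
    end at exactly the moment the next one begins without overlapping.
    """
    value: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still valid"

    def is_valid_at(self, when: datetime) -> bool:
        """Return True if this fact holds at the instant `when`."""
        if when < self.valid_from:
            return False
        return self.valid_to is None or when < self.valid_to


def facts_valid_at(facts: List[TimedFact], when: datetime) -> List[TimedFact]:
    """Answer "what was true THEN?" for a list of timed facts."""
    return [fact for fact in facts if fact.is_valid_at(when)]


# Made-up sample data: each reading is treated as valid until the next one.
readings = [
    TimedFact("glucose 95 mg/dL",
              datetime(2007, 7, 13, 8, 0), datetime(2007, 7, 13, 12, 0)),
    TimedFact("glucose 140 mg/dL",
              datetime(2007, 7, 13, 12, 0)),  # open-ended: still current
]

for fact in facts_valid_at(readings, datetime(2007, 7, 13, 9, 0)):
    print(fact.value)
```

Asking the same question with a later timestamp returns only the open-ended reading, and a timestamp before the first reading returns nothing at all, which is the "valid versus invalid" distinction in miniature.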

As I see it, computer users (both technical and non-technical) are becoming more aware of the temporal aspects of their data. There are few truly eternal verities (no matter what your philosophy, the Earth will not endure as it now is without intervention of some sort), so working with data that is true for only part of the time is part and parcel of dealing with data in general. As our systems become more capable and complex, they expand into regions where you cannot ignore the time aspects of your data anymore.